Parameterizable clip instruction and method of performing a clip operation using the same

ABSTRACT

A parameterizable clip instruction for a SIMD microprocessor architecture and a method of performing a clip operation using the same. A single instruction is provided with three input operands: a destination address, a source address and a controlling parameter. The controlling parameter includes a range type and a range specifier. The range type is a multi-bit integer in the operand that is used to index a table of range types. The range specifier plugs into the range type to define a range. The data input at the source address is clipped according to the controlling parameter. The instruction is particularly suited to video encoding/decoding applications in which the result of an interpolation or other calculation may lie outside the maximum value, so that the final result must be clipped to a saturation value, for example, the maximum pixel value. Signed and unsigned clipping ranges may be used, including ranges that are not powers of two.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 60/721,108 titled “SIMD Architecture and Associated Systems and Methods,” filed Sep. 28, 2005, the disclosure of which is hereby incorporated by reference in its entirety.

FIELD OF THE INVENTION

The invention relates generally to embedded microprocessor architectures and more specifically to a clip instruction for SIMD microprocessor architectures and a method of performing a clip operation using such a clip instruction.

BACKGROUND OF THE INVENTION

Single instruction multiple data (SIMD) architectures have become increasingly important as demand for video processing in electronic devices has increased. The SIMD architecture exploits the data parallelism that is abundant in data manipulations often found in media related applications, such as discrete cosine transforms (DCT) and filters. Data parallelism exists when a large mass of data of uniform type needs the same instruction performed on it. Thus, in contrast to a single instruction single data (SISD) architecture, in a SIMD architecture a single instruction may be used to effect an operation on a wide block of data. SIMD architecture exploits parallelism in the data stream while SISD can only operate on data sequentially.

An example of an application that takes advantage of SIMD is one where the same value is being added to a large number of data points, a common operation in many media applications. One example of this is changing the brightness of a graphic image. Each pixel of the image may consist of three values for the brightness of the red, green and blue portions of the color. To change the brightness, the R, G and B values, or alternatively the YUV values, are read from memory, a value is added to each, and the resulting values are written back to memory. A SIMD processor enhances performance of this type of operation over that of a SISD processor. A reason for this improvement is that in SIMD architectures, data is understood to be in blocks and a number of values can be loaded at once. Instead of a series of instructions to incrementally fetch individual pixels, a SIMD processor will have a single instruction that effectively says “get all these pixels.” Another advantage of SIMD machines is that multiple pieces of data are operated on simultaneously. Thus, a single instruction can say “perform this operation on all the pixels.” SIMD machines are therefore much more efficient in exploiting data parallelism than SISD machines.

SIMD architectures have particular promise for video encoding/decoding applications where many repetitive numerical computations must be performed on relatively large blocks of data. Numerical computation algorithms, such as those common in video encoding/decoding, often require results to be clipped to be within a specified range of values. For example, in video processing, a system will have a maximum pixel depth depending on the system's resolution. If the value of an intermediate calculation result, such as an interpolation or other calculation, lies outside the maximum value, the final result will have to be clipped to the saturation value, for example, the maximum pixel value.

Clipping is typically implemented in software using a sequence of instructions that first test the intermediate value and then conditionally assign the final value, for example, if value>maximum, then value=maximum. Such a software clipping implementation incurs a high overhead due to the number of calculations required to test each value. The sequential nature of a software implementation makes it very difficult to optimize in processors designed to exploit instruction level parallelism, such as, for example, SISD reduced instruction set (RISC) machines or very long instruction word (VLIW) machines. Some processors do implement clipping at the hardware level using specialized processor instructions; however, the clipping ranges of these instructions are fixed to some value, typically a power of two.
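The test-and-assign sequence described above can be illustrated with a minimal sketch. The function name clip_software and the sample pixel values are hypothetical and serve only to show why every value costs a comparison and a conditional assignment; the lower-bound test is an assumption added for completeness, since the text only mentions the maximum.

```python
def clip_software(values, minimum, maximum):
    """Clip values one at a time using explicit compare-and-assign steps."""
    clipped = []
    for value in values:
        # Test the intermediate value, then conditionally assign the final value.
        if value > maximum:
            value = maximum
        elif value < minimum:
            value = minimum
        clipped.append(value)
    return clipped


# Hypothetical interpolation results for 8-bit pixels (valid range 0..255).
print(clip_software([-12, 0, 130, 255, 300], 0, 255))  # [0, 0, 130, 255, 255]
```

Each element passes through the loop sequentially, which is the per-value overhead the passage refers to.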

SUMMARY OF THE INVENTION

Thus, there exists a need for a SIMD microprocessor architecture that ameliorates at least some of the above-noted deficiencies of conventional systems. At least one embodiment of the invention may provide a parameterizable microprocessor clip instruction. The parameterizable microprocessor clip instruction according to this embodiment may comprise a destination register operand, a source register operand of a value to be clipped, and a second source operand containing the control parameter specifying the manner in which clipping is to be performed, wherein the control parameter comprises a range type and range specifier. It should be appreciated that in the context of a SIMD machine, the source operand containing the “value” to be clipped is really referring to the values to be clipped because a 128-bit register is used to hold 8 16-bit values to be clipped by a single instruction.

Accordingly, at least one embodiment of the invention may provide a method of causing a microprocessor to perform a clip operation. The method according to this embodiment may comprise providing an assembly instruction to the microprocessor, the instruction comprising an input address, an output address and a controlling parameter, decoding the instruction with logic in the microprocessor, retrieving a data input from the input address, determining a specific clip operation based on the controlling parameter, performing the clip operation on the data input, and writing the result to the output address.

Another embodiment of the invention may provide a method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor. The method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor may comprise specifying a source address of a data input, a destination address of a clipped output and a controlling parameter in a single instruction, obtaining the data input at the source address, performing the clip operation on the data input in accordance with the controlling parameter, and storing the result at the destination address.

At least one other embodiment of the invention may provide a parameterizable assembly language program instruction for performing a clip operation in a video processing application. The parameterizable assembly language program instruction according to this embodiment may comprise an instruction name for a particular microprocessor instruction, a first instruction input operand comprising a destination register address to write an instruction result, a second instruction input operand comprising a source register address containing a value to be clipped, and a third instruction input operand comprising a controlling parameter.

These and other embodiments and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to facilitate a fuller understanding of the present disclosure, reference is now made to the accompanying drawings, in which like elements are referenced with like numerals. These drawings should not be construed as limiting the present disclosure, but are intended to be exemplary only.

FIG. 1 is a diagram illustrating the components of a parameterizable clip instruction for either SISD or SIMD processor architectures according to at least one embodiment of the invention;

FIG. 2 illustrates the format of a 32-bit parameter input to the parameterizable clip instruction of FIG. 1 according to at least one embodiment of the invention;

FIG. 3 is a table illustrating the ways in which the parameters of the parameterizable clip instruction may be specified; and

FIG. 4 is a flow chart of an exemplary method of performing a clip operation with a parameterizable clip instruction according to at least one embodiment of the invention.

DETAILED DESCRIPTION

The following description is intended to convey a thorough understanding of the embodiments described by providing a number of specific embodiments and details involving microprocessor architecture and systems and methods for performing clip operations with a parameterizable clip instruction. It should be appreciated, however, that the present invention is not limited to these specific embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.

Referring now to FIG. 1, a diagram illustrating the components of a parameterizable clip instruction for either SISD or SIMD processor architectures according to at least one embodiment of the invention is provided. As discussed above, algorithms in numerical computations, such as those common in video encoding/decoding, often require results to be clipped to be within a specified range of values. For example, in video processing, a system will have a maximum pixel depth depending on the system's resolution. If the value of an intermediate calculation result, such as an interpolation or other calculation, lies outside the maximum value, the final result will have to be clipped to a saturation value, for example, the maximum pixel value.

Conventionally, clipping is implemented in software using a sequence of instructions that first test the intermediate value and then conditionally assign the final value, for example, if value>maximum, then value=maximum. Such a software clipping implementation incurs a high overhead due to the number of calculations required to test each value. The sequential nature of a software implementation makes it very difficult to optimize in processors designed to exploit instruction level parallelism, such as, for example, SISD reduced instruction set (RISC) machines or very long instruction word (VLIW) machines. Some processors do implement clipping at the hardware level using specialized processor instructions; however, the clipping ranges of these instructions are fixed to some value, typically a power of two. Therefore, various embodiments of this invention provide a parameterizable clip instruction for a microprocessor that enables adjustment of clipping parameters.

Referring to FIG. 1, the instruction 100 labeled “VBCLIP” contains three elements, rd, rb and rc. Rb and rd are the source and destination register addresses, respectively. That is, rb is the register address of the value to be clipped and rd is the register address where the clipped value is to be written. Rc is the controlling parameter for the instruction. The value of rc dictates how the value located at address rb will be clipped. This instruction permits eight 16-bit values to be clipped within the range specified by the control parameter rc.

FIG. 2 illustrates the format of the controlling parameter rc in the form of a 32-bit operand and FIG. 3 is a table illustrating the ways in which the parameters of the parameterizable clip instruction may be specified. As seen from these Figures, in this example, the input rc is a 32-bit input. However, it should be appreciated that depending upon the native word size of the processor, rc may be 16, 32, 64, 128 or other bit size. In various embodiments, the most significant 16 bits, that is, bits 31 to 16, are unused as seen in the table. In various embodiments, bits 15 and 14 are reserved for the range type, while bits 13 to 0 are used for the range specifier.
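As an illustration of this bit layout, the following sketch splits a 32-bit control word into its two fields. The helper names decode_control and encode_control and the constant names are hypothetical; only the bit positions (bits 15 and 14 for the range type, bits 13 to 0 for the range specifier, bits 31 to 16 unused) are taken from the description.

```python
RANGE_TYPE_SHIFT = 14     # range type occupies bits 15..14
RANGE_TYPE_MASK = 0x3     # two bits
RANGE_SPEC_MASK = 0x3FFF  # range specifier N occupies bits 13..0


def decode_control(rc):
    """Split a control word into (range_type, range_specifier).

    Bits 31..16 are ignored, matching the unused upper half of the operand.
    """
    range_type = (rc >> RANGE_TYPE_SHIFT) & RANGE_TYPE_MASK
    range_specifier = rc & RANGE_SPEC_MASK
    return range_type, range_specifier


def encode_control(range_type, range_specifier):
    """Build a control word from a 2-bit range type and a 14-bit specifier."""
    if not 0 <= range_type <= RANGE_TYPE_MASK:
        raise ValueError("range type must fit in two bits")
    if not 0 <= range_specifier <= RANGE_SPEC_MASK:
        raise ValueError("range specifier must fit in 14 bits")
    return (range_type << RANGE_TYPE_SHIFT) | range_specifier


print(decode_control(0x40FF))      # (1, 255)
print(decode_control(0xABCD40FF))  # (1, 255): upper 16 bits are ignored
```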

In the example of FIG. 3, four range types are available. Specifically, the range types [0, 2^(N)−1], [−N, N], [−2^(N), 2^(N)−1] and [0, N] correspond to the 2-bit binary values 00, 01, 10 and 11, respectively. The remaining 14 least significant bits, bits 13 to 0, are used to represent N, the range specifier. These bits contain a binary number having a maximum value of 11111111111111 (16383). Thus, by using range type 01 or 11, ranges not limited to powers of two may be used.
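The mapping from a range type and range specifier N to concrete bounds can be sketched as follows. The function name clip_range is hypothetical, and the sample values of N are chosen only for illustration; the four range formulas are those listed for FIG. 3.

```python
def clip_range(range_type, n):
    """Return the inclusive (low, high) bounds for a 2-bit range type and N."""
    if range_type == 0b00:
        return 0, (1 << n) - 1          # [0, 2^N - 1]
    if range_type == 0b01:
        return -n, n                    # [-N, N]
    if range_type == 0b10:
        return -(1 << n), (1 << n) - 1  # [-2^N, 2^N - 1]
    if range_type == 0b11:
        return 0, n                     # [0, N]
    raise ValueError("range type must be a 2-bit value")


print(clip_range(0b00, 8))    # (0, 255)
print(clip_range(0b01, 100))  # (-100, 100)
print(clip_range(0b10, 7))    # (-128, 127)
print(clip_range(0b11, 219))  # (0, 219)
```

Types 00 and 10 always yield power-of-two bounds, whereas types 01 and 11 use N directly, which is how bounds such as 100 or 219 become expressible.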

In the table 110 of FIG. 3, the range specifier N is itself a parameter supplied to the VBCLIP instruction 100. The range type RT specifies one of the four possible ways the clipping range can be defined using the range specifier N. Range types 00 and 10 are designed to work with unsigned and signed clipping ranges, respectively, while types 01 and 11 are designed to work with signed and unsigned clipping ranges, respectively, that are not powers of two. The VBCLIP instruction is therefore a highly flexible processor implementation of clipping. In addition, though the example of FIGS. 2 and 3 describes VBCLIP as an SISD instruction, the instruction syntax can easily be extended to SIMD architectures in which both registers rb and rc are vector registers. In this case, clipping, as specified in rc, is applied to each slice of the vector register rb with the results assigned to the corresponding slice in rd. An additional advantage of a SIMD version of the clipping instruction is that it bypasses the data-dependent sequential nature of clipping operations, which is awkward to implement in parallel machines.
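The slice-wise behavior of a SIMD version can be modeled in software as a sketch. The 128-bit register is represented here as a Python integer holding eight signed 16-bit slices; the little-endian slice order, the helper names pack_lanes, unpack_lanes and vbclip_vector, and the assumption that the bounds have already been derived from rc are all illustrative choices, not details given in the description.

```python
import struct

LANE_FORMAT = "<8h"  # eight signed 16-bit lanes; little-endian order is assumed


def pack_lanes(lanes):
    """Pack eight signed 16-bit values into one 128-bit integer."""
    return int.from_bytes(struct.pack(LANE_FORMAT, *lanes), "little")


def unpack_lanes(bits):
    """Unpack a 128-bit integer into eight signed 16-bit values."""
    return list(struct.unpack(LANE_FORMAT, bits.to_bytes(16, "little")))


def vbclip_vector(rb_bits, low, high):
    """Clip every slice of rb to [low, high]; return the packed result for rd."""
    clipped = [min(max(value, low), high) for value in unpack_lanes(rb_bits)]
    return pack_lanes(clipped)


rb = pack_lanes([-5, 0, 16, 128, 235, 236, 300, 32767])
rd = vbclip_vector(rb, 0, 235)  # bounds for range type 11 with N = 235
print(unpack_lanes(rd))  # [0, 0, 16, 128, 235, 235, 235, 235]
```

Each slice is clipped independently of the others, so no result depends on a branch taken for a neighboring value.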

Referring now to FIG. 4, this figure is a flow chart of an exemplary method for performing a clip operation with a parameterizable clip instruction according to at least one embodiment of the invention. The method begins in step 200 and proceeds to step 205 where the clip instruction is fed to the microprocessor pipeline. As discussed above in the context of FIGS. 1-3, in various embodiments, the instruction takes the form of a name and three input operands: a destination address, a source address and a controlling parameter. Then, in step 210, the data to be operated on is fetched from the source address specified in the instruction. In step 215, after the instruction is decoded, the range type indicated in the instruction is referenced to determine the actual range. In various embodiments, the range type is represented by two bits of the controlling parameter rc. In various embodiments, a table that maintains a list of the range types indexed by the two-bit code is stored in a memory register of the processor. In step 220, the range specifier is extracted from the instruction and, using the range type, a range is determined. In step 225, the value fetched in step 210 is clipped in accordance with the range determined in step 220. In step 230, the result is written to the destination address specified in the destination address input operand rd of the instruction. Operation of the method stops in step 235.
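The steps of FIG. 4 can be traced in a small software model. This is only a sketch under stated assumptions: the register file is modeled as a dictionary, the register names vr0, vr1 and r2 and the function name run_clip are hypothetical, and the range-type table is modeled as a tuple of functions indexed by the two-bit code.

```python
# Hypothetical range-type table indexed by the two-bit code (formulas per FIG. 3).
RANGE_TABLE = (
    lambda n: (0, (1 << n) - 1),          # 00: [0, 2^N - 1]
    lambda n: (-n, n),                    # 01: [-N, N]
    lambda n: (-(1 << n), (1 << n) - 1),  # 10: [-2^N, 2^N - 1]
    lambda n: (0, n),                     # 11: [0, N]
)


def run_clip(registers, rd, rb, rc):
    """Model steps 210 to 230 for one clip instruction with operands rd, rb, rc."""
    data = registers[rb]                              # step 210: fetch source data
    control = registers[rc]
    range_entry = RANGE_TABLE[(control >> 14) & 0x3]  # step 215: look up range type
    low, high = range_entry(control & 0x3FFF)         # step 220: apply specifier N
    result = [min(max(v, low), high) for v in data]   # step 225: clip each value
    registers[rd] = result                            # step 230: write destination
    return result


registers = {
    "vr1": [-300, -8, 0, 7, 8, 100, 255, 256],  # eight values to be clipped
    "r2": (0b10 << 14) | 3,                     # range type 10, N = 3: [-8, 7]
}
run_clip(registers, "vr0", "vr1", "r2")
print(registers["vr0"])  # [-8, -8, 0, 7, 7, 7, 7, 7]
```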

The embodiments of the present inventions are not to be limited in scope by the specific embodiments described herein. For example, although many of the embodiments disclosed herein have been described with reference to systems and methods for performing clip operations with a parameterizable clip instruction, the principles herein are equally applicable to other aspects of microprocessor design and function. Indeed, various modifications of the embodiments of the present inventions, in addition to those described herein, will be apparent to those of ordinary skill in the art from the foregoing description and accompanying drawings. Thus, such modifications are intended to fall within the scope of the following appended claims. Further, although some of the embodiments of the present invention have been described herein in the context of a particular implementation in a particular environment for a particular purpose, those of ordinary skill in the art will recognize that its usefulness is not limited thereto and that the embodiments of the present inventions can be beneficially implemented in any number of environments for any number of purposes. Accordingly, the claims set forth below should be construed in view of the full breadth and spirit of the embodiments of the present inventions as disclosed herein.

1. A method of causing a microprocessor to perform a clip operation comprising: providing an assembly instruction to the microprocessor, the instruction comprising an input address, an output address and a controlling parameter; decoding the instruction with logic in the microprocessor; retrieving a data input from the input address; determining a specific clip operation based on the controlling parameter; performing the clip operation on the data input; and writing the result to the output address.
 2. The method according to claim 1, wherein determining a clip operation based on the controlling parameter comprises decoding the controlling parameter into a range type and a range specifier.
 3. The method according to claim 2, wherein the range type is a type selected from the group consisting of [0, 2^(N)−1], [−N, N], [−2^(N), 2^(N)−1] and [0, N], where N is the range specifier.
 4. The method according to claim 2, wherein decoding the controlling parameter into a range type comprises performing a table look up of a X-bit number in the controlling parameter where 2^(X) is the number of range types.
 5. The method according to claim 2, wherein performing the clip operation comprises clipping the input value according to the range type and range specifier.
 6. The method according to claim 1, wherein the input address and output address comprise vector registers.
 7. A method of performing a clip operation with a single parameterizable assembly language-based clip instruction executing on a microprocessor comprising: specifying a source address of a data input, a destination address of a clipped output and a controlling parameter in a single instruction; obtaining the data input at the source address; performing the clip operation on the data input in accordance with the controlling parameter; and storing the result at the destination address.
 8. The method according to claim 7, wherein specifying a controlling parameter comprises specifying a Y bit number including a range type and a range specifier, where Y is an integer power of 2.
 9. The method according to claim 8, wherein the range type is a type selected from the group consisting of [0, 2^(N)−1], [−N, N], [−2^(N), 2^(N)−1] and [0, N], where N is the range specifier.
 10. The method according to claim 9, wherein the range specifier is a positive integer.
 11. The method according to claim 8, wherein performing the clip operation in accordance with the controlling parameter comprises clipping the data input based on the instruction's range specifier and range type.
 12. The method according to claim 7, wherein the source address and destination address comprise vector registers and performing the clip operation comprises performing the clip operation in accordance with the controlling parameter on each slice of the source address vector registers and storing the results at a corresponding slice of the destination address vector register.
 13. A parameterizable assembly language program instruction for performing a clip operation in a video processing application comprising: an instruction name for a particular microprocessor instruction; a first instruction input operand comprising a destination register address to write an instruction result; a second instruction input operand comprising a source register address containing a value to be clipped; and a third instruction input operand comprising a controlling parameter.
 14. The instruction according to claim 13, wherein the controlling parameter comprises a Z-bit number wherein Z is an integer power of 2.
 15. The instruction according to claim 13, wherein the controlling parameter includes a range type and a range specifier.
 16. The instruction according to claim 15, wherein the range type is a type selected from the group consisting of [0, 2^(N)−1], [−N, N], [−2^(N), 2^(N)−1] and [0, N], where N is the range specifier.
 17. The instruction according to claim 16, wherein N is a positive integer.
 18. The instruction according to claim 13, wherein the destination register address and the source register address are vector register addresses. 